Inferential Statistics

• Make inferences about a population
• Research phenomena
• Make decisions
• Model the data
Descriptive statistics
Formulate research question (RQ)
State H0 & Ha relative to the RQ
Determine α, β, & sample size (n)
Choose statistical test: Z, t, χ², ...
Analyze data: test statistic & p-value
• Identify or understand the problem
• Make a statement about a relationship
  o between two or more variables
• Quantitative or qualitative
• Clear, focused, concise, testable
  o "Does X have an effect on Y...?" instead of "How does...?"

[Figure: The Path to a Research Question: From Broad Topic to a Specific Question. Stages shown: Topic (narrowed down through preliminary research), Working questions, Problematization]





2. Formulate Hypotheses

• an assumption about a population parameter
  o mean, proportion, standard deviation
  o identifiable by analysis
  o estimated from sample statistics


• a prediction of an expected outcome
  o null or alternative hypothesis
  o specific, testable, measurable
  o educated, informed prediction

[Figure: Steps in Hypothesis Testing, including "State Hypothesis" and "Calculate Test Statistic"]








• the assumption of absence ...
  o no difference, no relationship, no change
  o cannot be rejected unless the result is statistically significant
  o innocent until proven guilty
• starting point or benchmark of analysis
• status quo, current system of belief

Example: Do smokers weigh the same as non-smokers?

Null Hypothesis (H0): the average weight does not differ
H0: Mean weight of smokers = Mean weight of non-smokers

Alternative Hypothesis (Ha): the average weights differ
Ha: Mean weight of smokers ≠ Mean weight of non-smokers









• definite statement that a relationship exists
  o Ha is usually the opposite of H0
• research hypothesis: challenges the status quo
• accepted if results are statistically significant
  o The 12th graders' SAT scores are lower than those of the 10th graders
  o There is a relationship between reaction time and problem-solving ability
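To make the null and alternative hypotheses concrete, the smokers versus non-smokers question can be sketched as a two-sided independent-samples t-test. The weights below are invented purely for illustration, and the choice of alpha = 0.05 is an assumption, not something the slides prescribe.

```python
import numpy as np
from scipy import stats

# Hypothetical body weights in kg; these numbers are invented for illustration.
smokers = np.array([68.2, 71.5, 64.9, 70.1, 66.3, 73.0, 69.4, 65.8])
non_smokers = np.array([72.4, 75.1, 70.8, 74.6, 69.9, 77.2, 73.5, 71.0])

alpha = 0.05  # assumed significance level, fixed before looking at the data

# H0: mean weight of smokers == mean weight of non-smokers
# Ha: mean weight of smokers != mean weight of non-smokers (two-sided)
result = stats.ttest_ind(smokers, non_smokers)

# Reject H0 only when the p-value falls below alpha.
reject_h0 = result.pvalue < alpha
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With different data the decision may of course go the other way; the point is only the order of steps: state H0 and Ha, fix alpha, compute the test statistic and p-value, then decide.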
3. Level of Significance

• designated as alpha (α), a probability
  o common values: .01, .05, .1 (α determines the critical value of the test statistic)
  o the threshold probability of a Type I error
• for evaluating the p-value
  o probability of the outcome if H0 is true
  o the probability of the outcome (by chance)


[Figure: probability (vertical axis) over the set of possible results (horizontal axis), marking the most likely observation, the very unlikely observations in both tails, the observed data point, and the p-value]

A p-value (shaded green area) is the probability of an observed (or more extreme) result arising by chance
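As a small sketch of that definition, the tail area beyond an observed test statistic can be computed from the null distribution. The example assumes a standard normal null distribution and a made-up observed z value of 2.1.

```python
from scipy import stats

# Hypothetical observed z statistic; the value 2.1 is made up for illustration.
z_observed = 2.1

# Right-tail p-value: P(Z >= z_observed) under the standard normal null distribution.
p_right = stats.norm.sf(z_observed)

# Two-sided p-value: results at least this extreme in either direction.
p_two_sided = 2 * stats.norm.sf(abs(z_observed))

print(f"right-tail p = {p_right:.4f}, two-sided p = {p_two_sided:.4f}")
```

`norm.sf` is the survival function (1 minus the CDF), which is the shaded tail area in the figure.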





• Type I Error (false positive)
  o rejecting a true H0
  o probability depends on the α value
• Type II Error (false negative)
  o failing to reject (or accepting) a false H0
  o probability depends on the β value
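The link between α and the Type I error rate can be checked with a small simulation. This is a sketch under assumed settings (normal data, 30 observations per group, 2000 repeated experiments, a fixed random seed); none of these numbers come from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
alpha = 0.05
n_experiments = 2000
n_per_group = 30

# Both groups come from the same population, so H0 is true in every experiment.
group_a = rng.normal(loc=50.0, scale=10.0, size=(n_experiments, n_per_group))
group_b = rng.normal(loc=50.0, scale=10.0, size=(n_experiments, n_per_group))

# One independent-samples t-test per row (per simulated experiment).
p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# Every rejection here is a false positive, i.e. a Type I error.
type_1_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {type_1_rate:.3f} (alpha = {alpha})")
```

The observed rate should land close to alpha, which is what "probability depends on the α value" means in practice.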
Choice of Statistic

scipy.stats.ttest_ind()
scipy.stats.ttest_rel()
scipy.stats.f_oneway()

Comparison of Means: One or Two Groups

[Flowchart: "Is the sample size above 30?", with branches leading to "Use the Z-Test" or "Use the T-Test"]

[Figure: Independent Sample t-Test (Test 1, Test 2)]
One-Tailed and Two-Tailed Tests

One-tailed tests: based on a uni-directional hypothesis
Example: Effect of training on problems using PowerPoint (PP)
Population figures for usability of PP are known
Hypothesis: Training will decrease the number of problems with PP

Two-tailed tests: based on a bi-directional hypothesis
Hypothesis: Training will change the number of problems with PP

[Figure: sampling distributions for one-tailed and two-tailed tests, with regions labelled by which hypothesis is more probable]
Right-tail test
Ha: μ > value

Left-tail test
Ha: μ < value

Two-tail test
Ha: μ ≠ value
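The three forms of Ha map onto the `alternative` argument of SciPy's t-test functions. The sketch below uses the PowerPoint training example with invented data and an assumed population mean of 5.5 problems per user.

```python
import numpy as np
from scipy import stats

# Hypothetical number of problems per user after training (invented data).
problems_after_training = np.array([4, 6, 5, 3, 5, 4, 6, 2, 5, 4])
population_mean = 5.5  # assumed known population figure, for illustration only

# Left-tail test, Ha: mu < value (training decreases the number of problems).
left = stats.ttest_1samp(problems_after_training, population_mean, alternative="less")

# Right-tail test, Ha: mu > value.
right = stats.ttest_1samp(problems_after_training, population_mean, alternative="greater")

# Two-tail test, Ha: mu != value (training changes the number of problems).
two = stats.ttest_1samp(problems_after_training, population_mean, alternative="two-sided")

print(f"left-tail p  = {left.pvalue:.4f}")
print(f"right-tail p = {right.pvalue:.4f}")
print(f"two-tail p   = {two.pvalue:.4f}")
```

Because the sample mean is below the assumed population mean, the left-tail p-value is half the two-tail p-value, and the left-tail and right-tail p-values sum to 1.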





Independent Samples T-Test

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$

Two different groups / one variable

Paired Samples T-Test

Same group / different measures (e.g. Math and English)
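The difference between the two designs shows up clearly in code. The sketch below uses invented Math and English scores for the same eight students; running an independent-samples test on them is done only for contrast, since the data are actually paired.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same eight students (invented for illustration).
math_scores = np.array([72, 65, 80, 58, 90, 77, 69, 84])
english_scores = np.array([75, 70, 82, 63, 91, 80, 74, 85])

# Paired samples: same group, different measures. Tests the mean of the
# per-student differences against zero.
paired = stats.ttest_rel(math_scores, english_scores)

# Independent samples: treats the two columns as unrelated groups.
# Shown for contrast only; it ignores the pairing and loses power here.
independent = stats.ttest_ind(math_scores, english_scores)

print(f"paired:      t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"independent: t = {independent.statistic:.3f}, p = {independent.pvalue:.4f}")
```

The paired test removes between-student variability, so a small but consistent difference is detected, while the independent test on the same numbers is not significant.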


Comparison of Means 
More than Two Groups 
ANOVA is used to compare the means of >2 groups
• One-Way: >2 groups, 1 quantitative variable
• Two-Way: >2 groups, split on 2 factors, 1 quantitative variable
• Repeated: >2 groups, 1 or more quantitative variables, measured repeatedly





ANOVA: Analysis of Variance is a variability ratio

F-ratio = Variance Between / Variance Within

Sum of Squares Between + Sum of Squares Within = Total Sum of Squares

Two types of mean used in ANOVA:
1. Mean of each sample
2. Grand mean of all the observations

The total variation is made up of:
1. The variation within the groups
2. The variation between the groups
At least one group mean differs

k is the number of groups
n is the total number of observations in all groups
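The F-ratio can be built by hand from the two kinds of mean (group means and grand mean) and then checked against `scipy.stats.f_oneway`. The three groups below are invented data; k and n follow the definitions above.

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (invented data for illustration).
groups = [
    np.array([24.5, 23.5, 26.4, 27.1, 29.9]),
    np.array([28.4, 34.2, 29.5, 32.2, 30.1]),
    np.array([26.1, 28.3, 24.3, 26.2, 27.8]),
]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation between groups: group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation within groups: observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)    # df between = k - 1
ms_within = ss_within / (n - k)      # df within = n - k
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n - k)

# Cross-check against SciPy's one-way ANOVA.
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual: F = {f_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.3f}, p = {p_scipy:.4f}")
```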











Descriptives: Current Salary

                                                            95% Confidence Interval for Mean
Group       N    Mean         Std. Deviation   Std. Error   Lower Bound    Upper Bound    Minimum    Maximum
Clerical    30   $29,125.60   $10,254.928      $1,872.285   $25,296.35     $32,954.85     $10,000    $48,093
Custodial    5   $53,600.00   $3,781.534       $1,691.153   $48,904.61     $58,295.39     $50,000    $59,000
Manager     15   $94,616.47   $22,907.375      $5,914.659   $81,930.79     $107,302.15    $31,555    $123,000
Total       50   $51,220.30   $33,004.455      $4,667.535   $41,840.54     $60,600.06     $10,000    $123,000

ANOVA: Current Salary

Source           Sum of Squares   df   Mean Square
Between Groups   4.292E+10         2   2.146E+10
Within Groups    1.045E+10        47   222413032.1
Total            5.338E+10        49
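The F-ratio for the Current Salary example can be recomputed from the reported sums of squares. The degrees of freedom used here are derived from the group counts (k = 3 groups, n = 50 observations), and the resulting F and p-value are a sketch based on the rounded figures shown, not values read from the original output.

```python
from scipy import stats

# Sums of squares as reported (rounded) in the Current Salary ANOVA output.
ss_between = 4.292e10
ss_within = 1.045e10

k = 3    # Clerical, Custodial, Manager
n = 50   # total number of employees

df_between = k - 1   # 2
df_within = n - k    # 47

ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group mean squares.
f_ratio = ms_between / ms_within
p_value = stats.f.sf(f_ratio, df_between, df_within)

print(f"MS between = {ms_between:.4g}, MS within = {ms_within:.4g}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, p = {p_value:.3g}")
```

Because the inputs are rounded, the mean squares agree with the table only to the displayed precision.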

















Source       SS              df               MS              F
Conditions   SS_conditions   (k − 1)          MS_conditions   MS_conditions / MS_error
Subjects     SS_subjects     (n − 1)          MS_subjects     MS_subjects / MS_error
Error        SS_error        (k − 1)(n − 1)   MS_error












[Figure: frequency distributions illustrating variation within groups and variation between groups]


Descriptive Statistics

Dependent Variable: Int Politics

Gender   Edu Level    Mean      Std. Deviation
Male     School       38.2000    4.18463
         College      44.1000    4.26745
         University   64.1000    3.07137
         Total        48.8000   11.87841
Female   School       39.6000    3.27278
         College      44.6000    3.27278
         University   58.0000    6.46357
         Total        47.4000    9.05767
Total    School       38.9000    3.72615
         College      44.3500    3.71023
         University   61.0500    5.83524
         Total        48.1000   10.49649

Two-way ANOVA (without replication)
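The mechanics of a two-way ANOVA without replication can be sketched with one value per cell. As an illustration only, the six gender-by-education cell means from the descriptive statistics are treated here as if they were single observations; this is an assumption made to show the arithmetic and is not the analysis behind the slide.

```python
import numpy as np
from scipy import stats

# One value per cell: rows are the two gender blocks, columns are
# School, College, University. Cell means are used as stand-in observations.
cells = np.array([
    [38.2, 44.1, 64.1],
    [39.6, 44.6, 58.0],
])
r, c = cells.shape

grand_mean = cells.mean()
row_means = cells.mean(axis=1)
col_means = cells.mean(axis=0)

# Partition the total variation into row, column, and error components.
ss_rows = c * ((row_means - grand_mean) ** 2).sum()
ss_cols = r * ((col_means - grand_mean) ** 2).sum()
ss_total = ((cells - grand_mean) ** 2).sum()
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols

# Each factor's F-ratio uses the error mean square as the denominator.
f_rows = (ss_rows / df_rows) / (ss_error / df_error)
f_cols = (ss_cols / df_cols) / (ss_error / df_error)
p_rows = stats.f.sf(f_rows, df_rows, df_error)
p_cols = stats.f.sf(f_cols, df_cols, df_error)

print(f"rows (gender):    F = {f_rows:.3f}, p = {p_rows:.3f}")
print(f"cols (education): F = {f_cols:.3f}, p = {p_cols:.3f}")
```

With only one value per cell there is no way to estimate an interaction term separately, which is why the error degrees of freedom are (r − 1)(c − 1).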
Post-Hoc Analysis

• ANOVA: does not identify which mean differences are significant
• Post hoc: explore differences between multiple group means
  o while controlling the experiment-wise error rate
• Tukey Honestly Significant Difference (HSD)
  o variation of the t distribution, takes the n into account
  o HSD is compared with the difference between the means
  o if mean difference > HSD, then the difference is significant
  o most common method of controlling Type I error rates
• Tukey's HSD
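A pairwise post-hoc comparison can be sketched with `scipy.stats.tukey_hsd`, which assumes a reasonably recent SciPy release. The three groups are invented data, and alpha = 0.05 is an assumed threshold.

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (invented data for illustration).
group_a = [24.5, 23.5, 26.4, 27.1, 29.9]
group_b = [28.4, 34.2, 29.5, 32.2, 30.1]
group_c = [26.1, 28.3, 24.3, 26.2, 27.8]

# Tukey's HSD compares every pair of group means while controlling
# the experiment-wise (family-wise) error rate.
result = stats.tukey_hsd(group_a, group_b, group_c)

alpha = 0.05  # assumed significance level
# result.pvalue[i, j] is the adjusted p-value for group i versus group j.
significant = result.pvalue < alpha

print(result)
print(significant)
```

In this made-up example the second group differs from the other two, while the first and third groups do not differ from each other.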
Q Tables

Note: These are abbreviated tables

1. Alpha = .05
2. Alpha = .01

Q critical values for alpha = .
df are for the Error Term
k = Number of Treatments

[Table: rows indexed by df, columns indexed by k]
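Instead of a printed Q table, critical values can be looked up from the studentized range distribution. This sketch assumes a SciPy version that provides `scipy.stats.studentized_range`; the helper name `q_critical`, the choice of k = 3 and df = 12, and the MS_within and group size used for the HSD are all hypothetical.

```python
import numpy as np
from scipy import stats

def q_critical(alpha, k, df_error):
    """Critical value of the studentized range for k treatments and error df."""
    return stats.studentized_range.ppf(1 - alpha, k, df_error)

# Hypothetical lookup: 3 treatments, 12 error degrees of freedom.
q_05 = q_critical(0.05, k=3, df_error=12)
q_01 = q_critical(0.01, k=3, df_error=12)
print(f"q(.05; k=3, df=12) = {q_05:.2f}")
print(f"q(.01; k=3, df=12) = {q_01:.2f}")

# HSD for equal group sizes: q * sqrt(MS_within / n_per_group).
ms_within = 4.68      # hypothetical within-group mean square
n_per_group = 5       # hypothetical observations per group
hsd = q_05 * np.sqrt(ms_within / n_per_group)
print(f"HSD = {hsd:.2f}  (a mean difference larger than this is significant)")
```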
Appendix

• Developed by William Gosset
• Worked in a brewery
• Compared the quality of the barley
• Invented the t-test in 1908
• Used a pen-name ("Student")
• Employers did not want to reveal their use of statistics (trade secret)
t = signal / noise = (difference between group means) / (variability of groups)
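Assuming unequal variances, the test statistic is calculated as:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ the sample variances, and $n_1, n_2$ the sample sizes.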
Assuming equal variances, the test statistic is calculated as:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$s^2 = \frac{\sum_{i=1}^{n_1}(x_i - \bar{x}_1)^2 + \sum_{j=1}^{n_2}(x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}$$


Unpaired (Two Sample) t Test 
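Both versions of the unpaired t statistic can be computed by hand and compared with `scipy.stats.ttest_ind`. The two samples are invented, and the unequal-variance degrees of freedom use the Welch-Satterthwaite approximation, which is what SciPy applies when `equal_var=False`.

```python
import numpy as np
from scipy import stats

# Two hypothetical independent samples (invented data for illustration).
x1 = np.array([12.1, 14.3, 11.8, 13.5, 15.0, 12.7])
x2 = np.array([10.2, 11.9, 9.8, 12.4, 10.7, 11.1, 9.5, 10.9])

n1, n2 = len(x1), len(x2)
m1, m2 = x1.mean(), x2.mean()
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)  # sample variances

# Equal variances: pooled variance with n1 + n2 - 2 degrees of freedom.
s2_pooled = (((x1 - m1) ** 2).sum() + ((x2 - m2) ** 2).sum()) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / np.sqrt(s2_pooled * (1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), n1 + n2 - 2)

# Unequal variances: separate variances, Welch-Satterthwaite df.
se2 = v1 / n1 + v2 / n2
t_welch = (m1 - m2) / np.sqrt(se2)
df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

# Cross-check against SciPy.
scipy_pooled = stats.ttest_ind(x1, x2, equal_var=True)
scipy_welch = stats.ttest_ind(x1, x2, equal_var=False)
print(f"pooled: t = {t_pooled:.4f} (scipy {scipy_pooled.statistic:.4f})")
print(f"welch:  t = {t_welch:.4f} (scipy {scipy_welch.statistic:.4f}), df = {df_welch:.2f}")
```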


